Legendre Memory Unit
Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
We propose a novel memory cell for recurrent neural networks that dynamically maintains information across long windows of time using relatively few resources. The Legendre Memory Unit~(LMU) is mathematically derived to orthogonalize its continuous-time history -- doing so by solving $d$ coupled ordinary differential equations~(ODEs), whose phase space linearly maps onto sliding windows of time via the Legendre polynomials up to degree $d - 1$. Backpropagation across LMUs outperforms equivalently-sized LSTMs on a chaotic time-series prediction task, improves memory capacity by two orders of magnitude, and significantly reduces training and inference times. LMUs can efficiently handle temporal dependencies spanning $100\text{,}000$ time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time -- exceeding state-of-the-art performance among RNNs on permuted sequential MNIST. These results are due to the network's disposition to learn scale-invariant features independently of step size. Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales. We demonstrate that LMU memory cells can be implemented using $m$ recurrently-connected Poisson spiking neurons, $\mathcal{O}(m)$ time and memory, with error scaling as $\mathcal{O}(d / \sqrt{m})$. We discuss implementations of LMUs on analog and digital neuromorphic hardware.
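As a concrete illustration of the construction summarized in the abstract, the following minimal NumPy/SciPy sketch (not the authors' reference implementation) builds the $d$-dimensional LMU state-space matrices $A$ and $B$, discretizes the window-normalized system with a zero-order hold, and decodes a delayed copy of the input from the memory state using shifted Legendre polynomials. The order, window length, step size, and test signal are illustrative placeholders.

```python
import numpy as np
from scipy.signal import cont2discrete

def lmu_matrices(d):
    """Continuous-time LMU matrices for theta * dm/dt = A m + B u (d = polynomial order)."""
    q = np.arange(d)
    r = (2 * q + 1)[:, None]                        # (2i + 1), one row per state
    j, i = np.meshgrid(q, q)                        # i = row index, j = column index
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r
    B = ((-1.0) ** q)[:, None] * r
    return A, B

def discretize(A, B, theta, dt):
    """Zero-order-hold discretization of dm/dt = (A/theta) m + (B/theta) u."""
    C, D = np.ones((1, A.shape[0])), np.zeros((1, 1))
    Ad, Bd, *_ = cont2discrete((A / theta, B / theta, C, D), dt, method="zoh")
    return Ad, Bd

d, theta, dt = 6, 1.0, 1e-3                         # placeholder order, window, step size
A, B = lmu_matrices(d)
Ad, Bd = discretize(A, B, theta, dt)

steps = 2000
u = np.sin(2 * np.pi * np.arange(steps) * dt)       # placeholder input: 1 Hz sine for 2 s
m = np.zeros((d, 1))
for k in range(steps):                              # run the memory over the signal
    m = Ad @ m + Bd * u[k]

# Decode u(t - r*theta) from m using the shifted Legendre polynomials P_i(2r - 1).
r_delay = 0.5                                       # look halfway back into the window
basis = np.polynomial.legendre.Legendre.basis
decoder = np.array([basis(i)(2 * r_delay - 1) for i in range(d)])
print("decoded:", float(decoder @ m))
print("actual: ", u[steps - 1 - int(r_delay * theta / dt)])
```

Because the expansion is truncated at degree $d - 1$, the decoded value is only an approximation of the true delayed input, and it improves as $d$ grows relative to how quickly the signal varies within the window.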
Reviews: Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
Originality: the use of the Legendre polynomials seems rather creative; it was certainly important to define RNNs with good models of coupled linear units.
Quality: The set of benchmarks is well chosen to cover a broad range of the qualities an RNN requires. One non-artificial task would have been a plus, though. What would have been even more important is to support the theory by controlling for the importance of the initialization of the matrices A and B. What if A were initialized with a clever diagonal (for instance, the diagonal of A_bar)? As the architecture is already rather close to that of the NRU, one may wonder whether the architecture is not doing most of the job.
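To make the reviewer's suggested control concrete, here is a small hypothetical sketch (not an experiment reported in the paper) that constructs the LMU's window-normalized A matrix, forms its zero-order-hold transition matrix A_bar = exp(A*dt), and extracts the diagonal of A_bar as the alternative "clever diagonal" initialization the review proposes. The order, window length, and step size are placeholders.

```python
import numpy as np
from scipy.linalg import expm

d, theta, dt = 64, 100.0, 1.0        # placeholder order, window, and step size

# Same continuous-time LMU construction as in the sketch above (scaled by 1/theta).
q = np.arange(d)
r = (2 * q + 1)[:, None] / theta
j, i = np.meshgrid(q, q)
A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r

# Zero-order-hold transition matrix used at step size dt.
A_bar = expm(A * dt)

# Reviewer's hypothetical ablation: initialize the recurrent weights with only
# the diagonal of A_bar, train, and compare against the full analytic A_bar.
A_diag_init = np.diag(np.diag(A_bar))

print("spectral radius of full A_bar:   ", max(abs(np.linalg.eigvals(A_bar))))
print("spectral radius of diagonal init:", max(abs(np.diag(A_bar))))
```

Whether such a diagonal initialization recovers the full system's memory capacity is exactly the control experiment the review asks for; nothing in this sketch answers it.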
This paper proposes a new memory cell for recurrent neural networks that is (1) theoretically grounded and (2) allows for orders-of-magnitude longer memory than traditional approaches at comparable parameter cost. The results are also confirmed experimentally. This work is definitely of interest to the NeurIPS community and would be a great contribution to the conference.
Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware
Peter Blouw, Gurshaant Malik, Benjamin Morcos, Aaron R. Voelker, Chris Eliasmith
Keyword spotting (KWS) provides a critical user interface for many mobile and edge applications, including phones, wearables, and cars. As KWS systems are typically 'always on', maximizing both accuracy and power efficiency is central to their utility. In this work we use hardware aware training (HAT) to build new KWS neural networks based on the Legendre Memory Unit (LMU) that achieve state-of-the-art (SotA) accuracy and low parameter counts. This allows the neural network to run efficiently on standard hardware (212$\mu$W). We also characterize the power requirements of custom designed accelerator hardware that achieves SotA power efficiency of 8.79$\mu$W, beating general purpose low power hardware (a microcontroller) by 24x and special purpose ASICs by 16x.
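The abstract above does not spell out the network itself, so here is a minimal, hypothetical sketch of an LMU-based keyword-spotting classifier built with the open-source keras-lmu package. The feature dimensions, LMU order, window length, hidden size, and number of keyword classes are illustrative placeholders, and the paper's hardware-aware training and quantization steps are not shown.

```python
import tensorflow as tf
import keras_lmu  # open-source LMU layer for Keras (pip install keras-lmu)

# Placeholder dimensions for a keyword-spotting front end: 49 frames of
# 40 audio features per utterance, 12 output classes (10 keywords plus
# silence and unknown). These are illustrative, not the paper's settings.
n_frames, n_features, n_keywords = 49, 40, 12

inputs = tf.keras.Input(shape=(n_frames, n_features))
lmu = keras_lmu.LMU(
    memory_d=n_features,   # number of parallel memory channels (placeholder choice)
    order=8,               # d: Legendre coefficients per channel (placeholder)
    theta=n_frames,        # memory window length, in time-steps (placeholder)
    hidden_cell=tf.keras.layers.SimpleRNNCell(units=128),
)(inputs)
outputs = tf.keras.layers.Dense(n_keywords, activation="softmax")(lmu)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

This only shows the model skeleton; the accuracy and power figures reported in the abstract come from hardware-aware training and deployment on the specific devices the paper characterizes.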
Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
Aaron Voelker, Ivana Kajić, Chris Eliasmith
r/MachineLearning - [R] Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks (NeurIPS2019 Spotlight)